Tweet, text and sentiment analysis

- This project analyzes tweets posted between October 2021 and August 2022 about Márki-Zay Péter, who was not only Orbán Viktor's opponent in the 2022 Hungarian election but also a controversial public figure who has divided Hungarians.

- Tweets were fetched from the Twitter API using its keyword search option.

( - The Twitter API offers a keyword search option, ensuring that the most relevant tweets are retrieved.)

1. Exploratory data analysis

1.1 Introduction of the dataset

columns: tweet_id, tweet_full_text, tweet_created_at, tweet_retweet_count, tweet_favorite_count, tweet_user_id, tweet_hashtags, user_name, user_screen_name, user_location, user_description, user_created_at, user_followers_count, user_friends_count, user_favourites_count, user_statuses_count, user_listed_count, tweet_language, original_tweet_id

1.2 Number of tweets and retweets

1.3 Tweets created before and after the election

1.4 Tweets timeline

1.4.1 All tweets

1.4.2 Only tweets (without retweets)

1.5 Tweets language

1.6 Used hashtags

1.7 Tweet location

1.7.1 Location exists for tweets and retweets

1.7.2 Location exists for only tweets

1.7.3 Detected country list

- The country of each tweeter was detected from the user_location column using a custom detection script.
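The custom detection script itself is not shown here. As a rough illustration only, one way to map free-text user_location values to countries is an alias lookup. The alias table, the `detect_country` name, and the matching rule below are all hypothetical and are not the project's actual script:

```python
import re

# Hypothetical alias table: lowercase place names mapped to a country.
# A real script would need a far larger list (or a gazetteer).
COUNTRY_ALIASES = {
    "hungary": "Hungary",
    "magyarország": "Hungary",
    "budapest": "Hungary",
    "germany": "Germany",
    "deutschland": "Germany",
    "berlin": "Germany",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "london": "United Kingdom",
    "united states": "United States",
    "usa": "United States",
}


def detect_country(user_location):
    """Return a country name for a free-text location, or None if unknown."""
    if not user_location:
        return None
    text = user_location.lower()
    # Check longer aliases first so multi-word names win over short ones.
    for alias in sorted(COUNTRY_ALIASES, key=len, reverse=True):
        # Match the alias only as a whole word, not inside another word.
        pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
        if re.search(pattern, text):
            return COUNTRY_ALIASES[alias]
    return None
```

Whole-word matching keeps short aliases such as "uk" from matching inside unrelated words. Locations that match no alias return None and would be counted as undetected.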

1.7.4 Top 20 countries of all tweets

1.7.5 Top 20 countries of only tweets

1.7.6 Top 50 original user locations of all tweets

1.7.7 Sankey chart with locations

- Shows where the retweeters are located who reposted tweets originating from the top locations.

1.8 Tweets from media and news company accounts

1.9 Retweets

1.9.1 Retweets count based on language

1.9.2 Retweets count based on date

1.9.3 Retweets count based on month

1.9.4 TOP 20 most retweeted tweets with text

2. Text analysis

2.1 Distribution of word counts in tweet text

2.2 Wordclouds

2.2.1 Wordcloud of English tweets

2.2.2 Wordcloud of Hungarian tweets

3. Sentiment analysis

a) Tweets tagged with the "und" (undetermined) language code had their language detected with spaCy and the LanguageDetector library.

b) Non-English tweets were translated to English with Google Translate via the translators library.

c) Compound scores of the English texts were computed with vaderSentiment.

(For further details, see https://github.com/cjhutto/vaderSentiment)

d) The compound scores were mapped to the following sentiment categories:

0.75 to 1: very positive

0.5 to 0.75: positive

-0.5 to 0.5: neutral

-0.75 to -0.5: negative

-1 to -0.75: very negative
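The category thresholds above can be sketched as a small helper. The list does not say which category a score exactly on a threshold belongs to, so this sketch assumes boundary values go to the more extreme category (for example, 0.75 counts as very positive). The function name `sentiment_category` is illustrative and not taken from the project:

```python
def sentiment_category(compound):
    """Map a VADER compound score in [-1, 1] to one of five labels.

    Assumption: a score exactly on a threshold (0.75, 0.5, -0.5, -0.75)
    is assigned to the more extreme of the two neighbouring categories.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must be between -1 and 1")
    if compound >= 0.75:
        return "very positive"
    if compound >= 0.5:
        return "positive"
    if compound > -0.5:
        return "neutral"
    if compound > -0.75:
        return "negative"
    return "very negative"
```

With vaderSentiment installed, the compound score for a text is available as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, and that value can be passed straight to the helper.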

3.1 Distribution of compound scores

3.2 Retweet statistics based on sentiments

3.3 Timeline of sentiments

3.4 Locations and sentiments

3.5 Languages and sentiments